Producing scalable performance with OpenMP: Experiments with two CFD applications

Authors

  • Jay Hoeflinger
  • Prasad Alavilli
  • Thomas Jackson
  • Bob Kuhn
Abstract

OpenMP is a relatively new programming paradigm, which can easily deliver good parallel performance for small numbers (<16) of processors. Success with more processors is more difficult to produce. MPI is a relatively mature programming paradigm, and there have been many reports of highly scalable MPI codes for large numbers (hundreds, even thousands) of processors. In this paper, we explore the causes of poor scalability with OpenMP from two points of view. First, we incrementally transform the loops in a combustion application until we achieve reasonably good parallel scalability, and chronicle the effect of each step. Then, we approach scalability from the other direction by transforming a highly scalable program simulating the core flow of a solid fuel rocket engine (originally written with MPI calls) directly to OpenMP, and report the barriers to scalability that were detected. The list of incremental transformations includes well-known techniques such as loop interchange and loop fusion, plus new ones that make use of the unique features of OpenMP, such as barrier removal and the use of ordered serial loops. The list of barriers to scalability includes the use of the ALLOCATE statement within a parallel region, as well as the lack of a reduction clause for a PARALLEL region in OpenMP. We conclude with a list of key issues that need to be addressed to make OpenMP a more easily scalable paradigm. Some of these are OpenMP implementation issues; some are language issues.
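Two of the transformations named in the abstract, barrier removal and loop fusion, can be illustrated with a minimal sketch. This is not the authors' code: the applications in the paper appear to be Fortran, while the sketch below is C, and the function names `scale_independent` and `fused_update_and_sum` and the array layout are hypothetical, chosen only for illustration. Compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same results.

```c
#include <assert.h>

/* Barrier removal (hypothetical kernel): two loops inside one parallel
 * region. The nowait clause drops the implicit barrier after the first
 * loop. This is safe here only because the second loop touches b alone
 * and never reads what the first loop writes to a. */
void scale_independent(double *a, double *b, int n)
{
    assert(n >= 0);
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * a[i];

        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = b[i] + 1.0;
    }
}

/* Loop fusion (hypothetical kernel): an update loop and a summation loop
 * over the same index range are merged into one loop, so the work is
 * shared once and only one barrier is paid at the end. */
double fused_update_and_sum(double *a, const double *b, int n)
{
    assert(n >= 0);
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        a[i] = a[i] + b[i];   /* formerly the first loop  */
        sum += a[i];          /* formerly the second loop */
    }
    return sum;
}
```

Both changes reduce the number of synchronization points per time step, which is the kind of overhead that tends to grow with processor count. Whether `nowait` is legal depends entirely on the data dependences between the adjacent loops, so it has to be checked case by case.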


Similar resources

Porting and performance evaluation of irregular codes using OpenMP

In the last two years, OpenMP has been gaining popularity as a standard for developing portable shared memory parallel programs. With the improvements in centralized shared memory technologies and the emergence of distributed shared memory (DSM) architectures, several medium-to-large physical and logical shared memory configurations are now available. Thus, OpenMP stands to be a promising medium...


Performance of a new CFD flow solver using a hybrid programming paradigm

This paper presents several algorithmic innovations and a hybrid programming style that lead to highly scalable performance using shared memory for a new computational fluid dynamics flow solver. This hybrid model is then converted to a strict message-passing implementation, and performance results for the two are compared. Results show that using this hybrid approach our OpenMP implementation ...


Mixed-mode implementation of PETSc for scalable linear algebra on multi-core processors

With multi-core processors a ubiquitous building block of modern supercomputers, it is now past time to enable applications to embrace these developments in processor design. To achieve exascale performance, applications will need ways of exploiting the new levels of parallelism that are exposed in modern high-performance computers. A typical approach to this is to use shared-memory programming...


Performance Modeling of Intel and Portland Compilers Using

In recent years, we have witnessed a growing interest in optimizing the parallel and distributed computing solutions using scaled-out hardware designs and scalable parallel programming paradigms. This interest is driven by the fact that the microchip technology is gradually reaching its physical limitations in terms of heat dissipation and power consumption. Therefore and as an extension to Moo...


Cube -user Manual Generic Display for Application Performance Data

CUBE is a generic presentation component suitable for displaying a wide variety of performance metrics for parallel programs including MPI and OpenMP applications. Program performance is represented in a multi-dimensional space including various program and system resources. The tool allows the interactive exploration of this space in a scalable fashion and browsing the different kinds of perfo...



Journal title:
  • Parallel Computing

Volume 27, Issue

Pages -

Publication date: 2001